feat: add range tombstones (delete_range / delete_prefix) (#21)
Conversation
- Add RangeTombstone, ActiveTombstoneSet, IntervalTree core types (ported from upstream PR fjall-rs#242 algorithms, own SST persistence)
- Add RangeTombstoneFilter for bidirectional iteration suppression
- Integrate into Memtable with interval tree for O(log n) queries
- Add remove_range(start, end, seqno) and remove_prefix(prefix, seqno) to AbstractTree trait, implemented for Tree and BlobTree
- Wire suppression into point reads (memtable + sealed + SST layers)
- Wire RangeTombstoneFilter into range/prefix iteration pipeline
- SST persistence: raw wire format in BlockType::RangeTombstone block
- Flush: collect range tombstones from sealed memtables, write to SSTs
- Compaction: propagate RTs from input to output tables with clipping
- GC: evict range tombstones below watermark at bottom level
- Table-skip: skip tables fully covered by a range tombstone
- MultiWriter: clip RTs to each output table's key range on rotation
- Handle RT-only tables (derive key range from tombstone bounds)
- 16 integration tests covering all paths

Closes #16
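The suppression rule these changes revolve around can be modeled in a few lines. This is an illustrative sketch, not the crate's actual types: an entry is hidden when a tombstone covers its key, is newer than the entry, and is visible to the reader (exclusive snapshot boundary, matching the later `visible_at` fix to `seqno < read_seqno`).

```rust
/// Illustrative model of a range tombstone deleting keys in [start, end)
/// (hypothetical names, not the crate's API).
#[derive(Clone, Debug)]
struct RangeTombstone {
    start: Vec<u8>,
    end: Vec<u8>,
    seqno: u64,
}

impl RangeTombstone {
    /// Half-open interval check: start is inclusive, end is exclusive.
    fn contains(&self, key: &[u8]) -> bool {
        self.start.as_slice() <= key && key < self.end.as_slice()
    }

    /// An entry is suppressed if the tombstone covers its key, the tombstone
    /// is newer than the entry, and the tombstone is visible to the reader.
    fn should_suppress(&self, key: &[u8], key_seqno: u64, read_seqno: u64) -> bool {
        self.contains(key) && key_seqno < self.seqno && self.seqno < read_seqno
    }
}
```

Snapshots taken before the delete keep seeing the old values because the last condition fails for them.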
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. This behavior can be configured in the review settings.
📝 Walkthrough

Adds end-to-end range-tombstone support: new RangeTombstone primitives, memtable interval index and active trackers, RT-aware iterators and point-read suppression, on-disk RT persistence (format V4), RT propagation through flush/compaction/writers (including RT-only tables), AbstractTree sealing and delete APIs, plus tests and recovery/version changes.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Tree
    participant ActiveMemtable
    participant SealedMemtables
    participant Flusher
    participant SST
    Client->>Tree: remove_range(start,end,seqno)
    Tree->>ActiveMemtable: insert_range_tombstone(start,end,seqno)
    ActiveMemtable->>ActiveMemtable: store in IntervalTree
    Client->>Tree: get(key, read_seqno)
    Tree->>Tree: is_suppressed_by_range_tombstones(key, key_seqno, read_seqno)?
    alt suppressed
        Tree-->>Client: None
    else not suppressed
        Tree->>SealedMemtables: read candidate / fallback to SST
        Tree-->>Client: Value
    end
    ActiveMemtable->>SealedMemtables: seal/flush
    Tree->>Flusher: flush_to_tables_with_rt(stream, collected_tombstones)
    Flusher->>SST: write SST(s) (include RT block or RT-only table)
    SST->>Tree: recovered tables include RTs
```
```mermaid
sequenceDiagram
    participant Iter
    participant Filter
    participant ActiveF
    participant ActiveR
    participant Inner
    Iter->>Filter: next()
    Filter->>Inner: pull next item (key, seqno)
    Filter->>ActiveF: activate RTs with start <= key
    Filter->>ActiveF: expire RTs with end <= key
    Filter->>ActiveF: is_suppressed(key_seqno)?
    alt suppressed
        Filter->>Inner: skip, loop
    else not suppressed
        Filter-->>Iter: yield item
    end
    Iter->>Filter: next_back()
    Filter->>Inner: pull prev item (key, seqno)
    Filter->>ActiveR: activate RTs with end > key
    Filter->>ActiveR: expire RTs with start > key
    Filter->>ActiveR: is_suppressed(key_seqno)?
    alt suppressed
        Filter->>Inner: skip, loop
    else not suppressed
        Filter-->>Iter: yield item
    end
```
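The forward sweep in the diagram above can be sketched as a plain loop over sorted entries. This is an illustrative model (the `Rt` type and function are invented here; the real filter is a bidirectional iterator wrapper): tombstones sorted by start are activated as the cursor reaches them and expired once the cursor passes their exclusive end.

```rust
#[derive(Clone)]
struct Rt {
    start: Vec<u8>,
    end: Vec<u8>, // exclusive
    seqno: u64,
}

/// Returns the keys that survive suppression, scanning entries in key order.
fn filter_forward(entries: &[(Vec<u8>, u64)], mut rts: Vec<Rt>, read_seqno: u64) -> Vec<Vec<u8>> {
    // Activation relies on tombstones being sorted by start ascending
    // (the same invariant the review asks RangeTombstoneFilter::new to enforce).
    rts.sort_by(|a, b| a.start.cmp(&b.start));
    let mut next = 0; // index of the next tombstone to activate
    let mut active: Vec<Rt> = Vec::new();
    let mut out = Vec::new();

    for (key, key_seqno) in entries {
        // Activate tombstones whose start <= current key.
        while next < rts.len() && rts[next].start.as_slice() <= key.as_slice() {
            active.push(rts[next].clone());
            next += 1;
        }
        // Expire tombstones whose exclusive end <= current key.
        active.retain(|rt| rt.end.as_slice() > key.as_slice());
        // Suppress if any visible active tombstone is newer than the entry.
        let suppressed = active
            .iter()
            .any(|rt| *key_seqno < rt.seqno && rt.seqno < read_seqno);
        if !suppressed {
            out.push(key.clone());
        }
    }
    out
}
```

The reverse sweep mirrors this with the comparisons flipped, as the second diagram shows.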
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ Passed checks (5 passed)
Pull request overview
Adds first-class range tombstones to the LSM-tree to enable efficient range/prefix deletes (remove_range, remove_prefix) with MVCC-aware visibility, and persists tombstones through flush/compaction/recovery via a new SST block and table metadata.
Changes:
- Introduces core range-tombstone types and in-memory structures (interval tree in memtables; sweep-line active sets + iterator filter).
- Extends flush/compaction/table IO to write/read range tombstones from SSTs (BlockType::RangeTombstone, TOC region, metadata count).
- Adds integration tests validating suppression semantics and persistence across flush/compaction/recovery.
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/range_tombstone.rs | Integration tests for range tombstone behavior and persistence. |
| src/abstract_tree.rs | Adds remove_range API and remove_prefix helper; extends flush path to pass tombstones. |
| src/tree/mod.rs | Tree integration: point-read suppression check + flush now accepts tombstones. |
| src/blob_tree/mod.rs | BlobTree integration: passes tombstones through flush; exposes remove_range. |
| src/memtable/mod.rs | Stores memtable range tombstones in a mutex-protected interval tree and exposes query/flush helpers. |
| src/memtable/interval_tree.rs | AVL interval tree backing for memtable tombstone queries. |
| src/range_tombstone.rs | Defines RangeTombstone, clipping/cover helpers, and key upper-bound helper. |
| src/active_tombstone_set.rs | Sweep-line “active tombstone” trackers for forward/reverse iteration. |
| src/range_tombstone_filter.rs | Bidirectional iterator wrapper to suppress entries covered by visible range tombstones. |
| src/range.rs | Collects tombstones across layers, adds table-skip optimization, and wraps iterators with tombstone filter. |
| src/table/block/type.rs | Adds BlockType::RangeTombstone = 4. |
| src/table/writer/mod.rs | Writes range tombstone blocks + metadata; supports RT-only tables by deriving key range. |
| src/table/regions.rs | Adds range_tombstones section handle in parsed table regions. |
| src/table/multi_writer.rs | Distributes/clips tombstones per output table via set_range_tombstones. |
| src/table/inner.rs | Stores loaded range tombstones in table inner state. |
| src/table/mod.rs | Loads/decodes range tombstone block on table recovery; exposes range_tombstones(). |
| src/compaction/flavour.rs | Extends compaction flavour to accept output range tombstones. |
| src/compaction/worker.rs | Collects tombstones from input tables and propagates them to compaction outputs (with last-level GC filtering). |
| src/lib.rs | Registers new internal modules for range tombstone support. |
| .forge/16/known-gaps.md | Documents remaining limitations and known gaps for Issue #16. |
| .forge/16/analysis.md | Architecture/design analysis and rationale for the chosen approach. |
Actionable comments posted: 9
🧹 Nitpick comments (2)
src/table/mod.rs (2)
570-577: Consider validating the block type for range tombstones.

The recovery path loads the range tombstone block with CompressionType::None but doesn't validate that the loaded block has BlockType::RangeTombstone. This differs from other block loading paths (e.g., the filter block at lines 553-560) that verify the block type.

♻️ Proposed fix
```diff
 let range_tombstones = if let Some(rt_handle) = regions.range_tombstones {
     log::trace!("Loading range tombstone block, with rt_ptr={rt_handle:?}");
     let block = Block::from_file(&file, rt_handle, crate::CompressionType::None)?;
+    if block.header.block_type != BlockType::RangeTombstone {
+        return Err(crate::Error::InvalidTag((
+            "BlockType",
+            block.header.block_type.into(),
+        )));
+    }
     Self::decode_range_tombstones(block)?
 } else {
     Vec::new()
 };
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/table/mod.rs` around lines 570 - 577, The range tombstone loading path doesn't validate the loaded block type; when regions.range_tombstones is Some you call Block::from_file(..., crate::CompressionType::None) and then Self::decode_range_tombstones(block) without checking block.block_type. Update the code to verify the loaded block's type equals BlockType::RangeTombstone (or return an error if not) before calling Self::decode_range_tombstones; reference the regions.range_tombstones branch, the Block::from_file call, crate::CompressionType::None, and Self::decode_range_tombstones to locate and modify the logic.
620-660: Consider validating the tombstone invariant start < end during decoding.

RangeTombstone::new has a debug_assert!(start < end) that only fires in debug builds. Corrupted on-disk data could produce invalid tombstones with start >= end in release builds, potentially causing incorrect behavior in the interval tree or filtering logic.

🛡️ Proposed validation
```diff
 tombstones.push(RangeTombstone::new(
     UserKey::from(start_buf),
     UserKey::from(end_buf),
     seqno,
 ));
+// Validate tombstone invariant
+if let Some(rt) = tombstones.last() {
+    if rt.start >= rt.end {
+        return Err(crate::Error::Unrecoverable);
+    }
+}
```

Or validate before constructing:
```diff
+let start_key = UserKey::from(start_buf);
+let end_key = UserKey::from(end_buf);
+if start_key >= end_key {
+    return Err(crate::Error::Unrecoverable);
+}
 tombstones.push(RangeTombstone::new(
-    UserKey::from(start_buf),
-    UserKey::from(end_buf),
+    start_key,
+    end_key,
     seqno,
 ));
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/table/mod.rs` around lines 620 - 660, The decoder (decode_range_tombstones) must validate the invariant start < end after reading start_buf and end_buf and before calling RangeTombstone::new; if the condition fails, return an appropriate crate::Error (e.g., a corruption/unrecoverable error) instead of constructing the tombstone so release builds don't accept invalid on-disk data; update the logic around UserKey::from(start_buf)/UserKey::from(end_buf) to compare the keys and early-return Err(...) when start >= end.
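A decoder enforcing that invariant might look roughly like the following. This is a self-contained sketch of the [start_len][start][end_len][end][seqno] wire format described later in this review, with little-endian u16 lengths and u64 seqno as the review's prompts indicate; the error type and function name are illustrative, not the crate's:

```rust
/// Decode (start, end, seqno) triples, validating lengths against the
/// remaining buffer and rejecting intervals where start >= end.
fn decode_range_tombstones(buf: &[u8]) -> Result<Vec<(Vec<u8>, Vec<u8>, u64)>, String> {
    fn read_u16(buf: &[u8], pos: usize) -> Result<u16, String> {
        buf.get(pos..pos + 2)
            .map(|b| u16::from_le_bytes([b[0], b[1]]))
            .ok_or_else(|| format!("truncated u16 at offset {pos}"))
    }

    let mut pos = 0usize;
    let mut out = Vec::new();

    while pos < buf.len() {
        let start_len = read_u16(buf, pos)? as usize;
        pos += 2;
        let start = buf
            .get(pos..pos + start_len)
            .ok_or_else(|| format!("truncated start key at offset {pos}"))?
            .to_vec();
        pos += start_len;

        let end_len = read_u16(buf, pos)? as usize;
        pos += 2;
        let end = buf
            .get(pos..pos + end_len)
            .ok_or_else(|| format!("truncated end key at offset {pos}"))?
            .to_vec();
        pos += end_len;

        let seqno_bytes = buf
            .get(pos..pos + 8)
            .ok_or_else(|| format!("truncated seqno at offset {pos}"))?;
        let seqno = u64::from_le_bytes(seqno_bytes.try_into().unwrap());
        pos += 8;

        // Reject corrupted intervals in release builds too.
        if start >= end {
            return Err(format!("invalid interval at offset {pos}: start >= end"));
        }
        out.push((start, end, seqno));
    }
    Ok(out)
}
```

Bounds-checking via `get(..)` before copying also avoids allocating from corrupted length fields, matching a later commit's "validate decoded field lengths before allocating" fix.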
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.forge/16/analysis.md:
- Around line 85-95: MD040 is triggered because the two fenced diff-stat blocks
in .forge/16/analysis.md lack a language tag; update the opening fences for the
two blocks that start with "origin/main vs upstream/main:" and "origin/main vs
upstream-pr-242:" to use a language tag (e.g., change ``` to ```text) so the
blocks become ```text; keep the content of the blocks unchanged.
In `@src/abstract_tree.rs`:
- Around line 722-743: The current remove_prefix implementation incorrectly
treats Bound::Unbounded by synthesizing a truncated exclusive end via
range_tombstone::upper_bound_exclusive(prefix), which doesn't form a true
lexicographic upper bound (e.g., b"\xff" still leaves b"\xff\x01"); fix by
adding an explicit unbounded-path instead of synthesizing an end: when
prefix_to_range yields hi == Bound::Unbounded, call a true unbounded-range
removal API (e.g., implement/use self.remove_range_unbounded(start, seqno) or
self.remove_range_from(start, seqno)) to remove [start, +∞), or if an unbounded
removal is not supported, explicitly reject the operation (return 0 or propagate
an error) rather than creating a truncated tombstone; update remove_prefix, and
add/modify remove_range* helper(s) accordingly (referencing remove_prefix,
prefix_to_range, range_tombstone::upper_bound_exclusive, and remove_range).
In `@src/active_tombstone_set.rs`:
- Around line 110-123: The initialize_from method currently activates every
provided RangeTombstone via activate(&rt) even if it starts in the future
relative to the iterator's current_key; change initialize_from to only activate
truly overlapping tombstones by either (A) adding a current_key parameter (e.g.,
initialize_from(&mut self, current_key: &KeyType, tombstones: impl
IntoIterator<Item = RangeTombstone>)) and filtering each rt with rt.start <=
*current_key && *current_key < rt.end before calling activate, or (B) make
initialize_from private/internal and require callers to pre-filter by that same
overlap predicate; apply the same fix to the analogous method/block at the
second occurrence (the code referenced around lines 226–230) and ensure
expire_until usage still enforces the end bound after activation.
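The overlap predicate suggested in option (A) above can be sketched as a simple filter over half-open intervals (illustrative tuple representation, not the crate's RangeTombstone type):

```rust
/// Keep only tombstones whose half-open interval [start, end) contains
/// `current_key` — the predicate initialize_from should apply before activate().
fn overlapping<'a>(
    current_key: &[u8],
    rts: &'a [(Vec<u8>, Vec<u8>, u64)], // (start, end, seqno), end exclusive
) -> Vec<&'a (Vec<u8>, Vec<u8>, u64)> {
    rts.iter()
        .filter(|(start, end, _)| {
            start.as_slice() <= current_key && current_key < end.as_slice()
        })
        .collect()
}
```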
In `@src/memtable/interval_tree.rs`:
- Around line 266-269: The insert method incorrectly increments self.len
unconditionally; change insert_node (the helper called by insert) to return
whether it actually created a new node (e.g., return a (node, bool) or a Node
plus created flag) when handling RangeTombstone duplicates, then update insert
to set self.root from the returned node and only increment self.len when the
created flag is true; update any calling sites or signatures for insert_node and
keep references to RangeTombstone, insert_node, and insert consistent.
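The len-accounting fix amounts to only counting inserts that actually create a node. A minimal model using a set in place of the AVL tree (illustrative names, not the crate's IntervalTree):

```rust
use std::collections::BTreeSet;

/// Toy stand-in for the interval tree: duplicates must not inflate `len`.
struct TombstoneIndex {
    set: BTreeSet<(Vec<u8>, Vec<u8>, u64)>, // (start, end, seqno)
    len: usize,
}

impl TombstoneIndex {
    fn new() -> Self {
        Self { set: BTreeSet::new(), len: 0 }
    }

    fn insert(&mut self, start: &[u8], end: &[u8], seqno: u64) {
        let created = self.set.insert((start.to_vec(), end.to_vec(), seqno));
        if created {
            // The reported bug was incrementing unconditionally here.
            self.len += 1;
        }
    }
}
```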
In `@src/memtable/mod.rs`:
- Around line 31-36: Memtables that only contain range tombstones are treated as
empty because Memtable::is_empty() only checks items; update logic so
Memtable::is_empty() also inspects range_tombstones by locking the Mutex and
calling the interval tree's is_empty (e.g., check
self.range_tombstones.lock().is_empty()), and propagate this change to callers
(Tree::rotate_memtable(), Tree::clear_active_memtable()) so they seal/flush
memtables and compute get_highest_seqno() correctly when only tombstones exist;
ensure get_highest_seqno() and any durability paths (remove_range/remove_prefix)
treat a memtable with non-empty range_tombstones as non-empty.
- Around line 183-190: The insert_range_tombstone method must reject empty or
reversed intervals before persisting: in insert_range_tombstone (and before
calling RangeTombstone::new or inserting into self.range_tombstones), validate
that start < end (using UserKey ordering) and if not, return 0 (do not
compute/record the tombstone). Move or gate the size calculation and the
lock+insert behind this check so reversed/empty ranges are never stored; keep
the rest of the method behavior unchanged.
In `@src/range_tombstone_filter.rs`:
- Around line 39-52: The constructor new() should ensure fwd_tombstones are
ordered by start ascending (and tie-break by seqno descending) before storing,
because fwd_activate_up_to() relies on forward tombstones being sorted by start;
add a sort on fwd_tombstones in new() such as sorting by (start asc, seqno desc)
prior to assigning Self.fwd_tombstones so out-of-order input Vecs can't skip
activation.
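The requested ordering, sketched with illustrative (start, seqno) pairs: start ascending, seqno descending on ties, so the newest tombstone at a shared boundary is considered first.

```rust
/// Sort forward tombstones by (start asc, seqno desc), the order
/// fwd_activate_up_to() is said to rely on.
fn sort_for_forward_sweep(rts: &mut [(Vec<u8>, u64)]) {
    rts.sort_by(|a, b| a.0.cmp(&b.0).then(b.1.cmp(&a.1)));
}
```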
In `@tests/range_tombstone.rs`:
- Around line 330-352: Add a new test that writes a range tombstone then tampers
with the SST on-disk range_tombstones block and asserts recovery fails: use
get_tmp_folder(), open_tree(), remove_range(), and flush_active_memtable(0) to
create an SST with the new range tombstone format
([start_len][start][end_len][end][seqno]), then locate the produced SST file in
the folder, corrupt/truncate bytes from the range_tombstones payload (e.g.
remove bytes from the end of the block or zero out length fields), attempt to
reopen the tree with open_tree(folder.path()) and assert that reopening returns
an Err (i.e. recovery fails) rather than succeeding; keep the test focused on
both truncated and malformed length scenarios so it fails on invalid RT
encoding.
- Around line 168-176: The test currently relies on behavior that could pass
even if the tombstone never left the active memtable; modify the test around the
remove_range/rotate_memtable sequence (the block using tree.remove_range,
tree.rotate_memtable, tree.insert, and tree.get) to explicitly assert that the
tombstone was actually sealed/flushed by checking rotate_memtable() returned
Some (or that tree.sealed_memtable_count() > 0) immediately after rotation, and
similarly for the later analogous block (lines ~367-374) assert table_count()
increases or reopen the DB to force RT-only flush visibility; update the
assertions in both places so the test fails if the tombstone stayed in the
active memtable.
---
Nitpick comments:
In `@src/table/mod.rs`:
- Around line 570-577: The range tombstone loading path doesn't validate the
loaded block type; when regions.range_tombstones is Some you call
Block::from_file(..., crate::CompressionType::None) and then
Self::decode_range_tombstones(block) without checking block.block_type. Update
the code to verify the loaded block's type equals BlockType::RangeTombstone (or
return an error if not) before calling Self::decode_range_tombstones; reference
the regions.range_tombstones branch, the Block::from_file call,
crate::CompressionType::None, and Self::decode_range_tombstones to locate and
modify the logic.
- Around line 620-660: The decoder (decode_range_tombstones) must validate the
invariant start < end after reading start_buf and end_buf and before calling
RangeTombstone::new; if the condition fails, return an appropriate crate::Error
(e.g., a corruption/unrecoverable error) instead of constructing the tombstone
so release builds don't accept invalid on-disk data; update the logic around
UserKey::from(start_buf)/UserKey::from(end_buf) to compare the keys and
early-return Err(...) when start >= end.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 0a4b651a-d997-4ee5-9e1a-ba1e0f088648
📒 Files selected for processing (21)
- .forge/16/analysis.md
- .forge/16/known-gaps.md
- src/abstract_tree.rs
- src/active_tombstone_set.rs
- src/blob_tree/mod.rs
- src/compaction/flavour.rs
- src/compaction/worker.rs
- src/lib.rs
- src/memtable/interval_tree.rs
- src/memtable/mod.rs
- src/range.rs
- src/range_tombstone.rs
- src/range_tombstone_filter.rs
- src/table/block/type.rs
- src/table/inner.rs
- src/table/mod.rs
- src/table/multi_writer.rs
- src/table/regions.rs
- src/table/writer/mod.rs
- src/tree/mod.rs
- tests/range_tombstone.rs
- Fix non-exhaustive BlockType match in table/util.rs (metrics feature)
- Add #[allow(dead_code)] for future-use API (CoveringRt, initialize_from)
- Add #[allow(clippy::unnecessary_box_returns)] for AVL tree functions
- Fix Option<&T> vs &Option<T> in interval_tree traversal functions
- Add missing # Panics and # Errors doc sections
- Add backticks in doc comments for identifiers
- Use let...else patterns where clippy suggests
- Fix redundant clone, pass-by-value, pub(crate) visibility
- Remove unfulfilled lint expectations (cfg-gated params)
- Validate start < end on SST decode (reject corrupted RT blocks)
- Validate start >= end in insert_range_tombstone (release-safe guard)
- Fix duplicate insert len accounting in IntervalTree
- Fix 0xFF prefix edge case in remove_prefix (return 0 for unrepresentable)
- Re-insert tombstones into active memtable when RT-only flush produces no valid SST (no index possible without KV items)
- Account for range tombstones in Memtable::is_empty
- Add cheap pre-check before collecting range tombstones in iteration
- Sort fwd_tombstones inside RangeTombstoneFilter::new for safety
- Harden test assertions (sealed count, table count)
Verify loaded block has BlockType::RangeTombstone before decoding, consistent with filter block validation pattern.
Actionable comments posted: 1
🧹 Nitpick comments (4)
src/table/iter.rs (1)
122-122: Prefer #[expect(clippy::too_many_arguments)] over #[allow(...)].

Per coding guidelines, #[expect] should be used instead of #[allow] because it warns when the suppression becomes unnecessary (e.g., if future refactoring reduces the argument count).

♻️ Suggested fix

```diff
-    #[allow(clippy::too_many_arguments)]
+    #[expect(clippy::too_many_arguments)]
```

As per coding guidelines: "Prefer #[expect(lint)] over #[allow(lint)] — #[expect] warns when suppression becomes unnecessary"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/table/iter.rs` at line 122, Replace the #[allow(clippy::too_many_arguments)] attribute on the function/impl annotated in this diff with #[expect(clippy::too_many_arguments)] so the suppression still applies but will warn if it becomes unnecessary; locate the exact attribute instance (the attribute decorating the function that currently uses #[allow(clippy::too_many_arguments)]) and change the attribute name only, leaving the lint and surrounding code unchanged.

src/abstract_tree.rs (1)
239-245: Consider #[expect] instead of #[warn] for the lint attribute.

The #[warn(clippy::type_complexity)] attribute is unusual here — it would emit warnings rather than suppress them. If the intent is to suppress the lint, use #[expect(clippy::type_complexity)] or #[allow(clippy::type_complexity)].

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/abstract_tree.rs` around lines 239 - 245, The attribute on function flush_to_tables uses #[warn(clippy::type_complexity)] which will emit warnings instead of suppressing them; change it to #[expect(clippy::type_complexity)] (or #[allow(clippy::type_complexity)] if you prefer permanent suppression) placed on the flush_to_tables function declaration so the clippy type_complexity lint is treated as expected/allowed for the flush_to_tables / flush_to_tables_with_rt code path.

src/range.rs (1)
101-101: Prefer #[expect(clippy::too_many_lines)] over #[allow(...)].

Per coding guidelines, use #[expect] so you get a warning when the suppression becomes unnecessary.

♻️ Suggested fix

```diff
-    #[allow(clippy::too_many_lines)]
+    #[expect(clippy::too_many_lines)]
```

As per coding guidelines: "Prefer #[expect(lint)] over #[allow(lint)] — #[expect] warns when suppression becomes unnecessary"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/range.rs` at line 101, Replace the attribute #[allow(clippy::too_many_lines)] with #[expect(clippy::too_many_lines)] to follow the coding guideline; locate the attribute (attached to the function/module declared near the top of src/range.rs) and change the lint suppression to an expectation so you receive a warning if the suppression becomes unnecessary.

src/tree/mod.rs (1)
778-816: Performance concern: linear scan of all SST tables for range tombstone suppression.

The current implementation iterates over every table in every level (lines 802-813) to check range tombstones. For point reads, this could be expensive with many tables.

Consider filtering tables by key range before checking their tombstones, similar to how get_for_key filters tables in get_internal_entry_from_tables. The PR notes this as a known limitation ("linear SST scan for point-read suppression").

💡 Potential optimization sketch
```diff
-    for table in super_version
-        .version
-        .iter_levels()
-        .flat_map(|lvl| lvl.iter())
-        .flat_map(|run| run.iter())
-    {
+    // Only check tables whose key range contains the key
+    for table in super_version
+        .version
+        .iter_levels()
+        .flat_map(|lvl| lvl.iter())
+        .filter_map(|run| run.get_for_key(key))
+    {
         for rt in table.range_tombstones() {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/tree/mod.rs` around lines 778 - 816, The is_suppressed_by_range_tombstones function currently scans every SST table; limit that by first filtering tables whose key range could contain the target key (like get_for_key / get_internal_entry_from_tables does) before iterating range_tombstones(): for each table from super_version.version.iter_levels().flat_map(...).flat_map(...), check the table's key range (e.g., smallest/largest key or a key_range() accessor on the Table) and skip tables where key is outside that range, then only call table.range_tombstones() and rt.should_suppress; keep the existing active_memtable/sealed_memtables checks and ensure you use the same table-range helpers used elsewhere to avoid duplicating range logic.
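The suggested pre-filter boils down to an inclusive key-range containment check before consulting a table's tombstones. A sketch with an invented TableInfo type (the real code would use the crate's key_range helpers):

```rust
/// Illustrative table descriptor: only min/max keys matter for the pre-filter.
struct TableInfo {
    min_key: Vec<u8>,
    max_key: Vec<u8>, // inclusive
}

/// Keep only tables whose [min_key, max_key] range could contain `key`;
/// only these tables' range tombstones need to be checked for suppression.
fn tables_to_check<'a>(tables: &'a [TableInfo], key: &[u8]) -> Vec<&'a TableInfo> {
    tables
        .iter()
        .filter(|t| t.min_key.as_slice() <= key && key <= t.max_key.as_slice())
        .collect()
}
```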
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/table/util.rs`:
- Line 31: The attribute #[allow(clippy::too_many_arguments)] in
src/table/util.rs should be changed to #[expect(clippy::too_many_arguments)] to
comply with the repo's policy; locate the function or item annotated with
#[allow(clippy::too_many_arguments)] (the attribute on the item in util.rs) and
replace the allow attribute with #[expect(clippy::too_many_arguments)]
preserving any existing whitespace/comments and ensuring the attribute exactly
matches the expect form.
---
Nitpick comments:
In `@src/abstract_tree.rs`:
- Around line 239-245: The attribute on function flush_to_tables uses
#[warn(clippy::type_complexity)] which will emit warnings instead of suppressing
them; change it to #[expect(clippy::type_complexity)] (or
#[allow(clippy::type_complexity)] if you prefer permanent suppression) placed on
the flush_to_tables function declaration so the clippy type_complexity lint is
treated as expected/allowed for the flush_to_tables / flush_to_tables_with_rt
code path.
In `@src/range.rs`:
- Line 101: Replace the attribute #[allow(clippy::too_many_lines)] with
#[expect(clippy::too_many_lines)] to follow the coding guideline; locate the
attribute (attached to the function/module declared near the top of
src/range.rs) and change the lint suppression to an expectation so you receive a
warning if the suppression becomes unnecessary.
In `@src/table/iter.rs`:
- Line 122: Replace the #[allow(clippy::too_many_arguments)] attribute on the
function/impl annotated in this diff with #[expect(clippy::too_many_arguments)]
so the suppression still applies but will warn if it becomes unnecessary; locate
the exact attribute instance (the attribute decorating the function that
currently uses #[allow(clippy::too_many_arguments)]) and change the attribute
name only, leaving the lint and surrounding code unchanged.
In `@src/tree/mod.rs`:
- Around line 778-816: The is_suppressed_by_range_tombstones function currently
scans every SST table; limit that by first filtering tables whose key range
could contain the target key (like get_for_key / get_internal_entry_from_tables
does) before iterating range_tombstones(): for each table from
super_version.version.iter_levels().flat_map(...).flat_map(...), check the
table's key range (e.g., smallest/largest key or a key_range() accessor on the
Table) and skip tables where key is outside that range, then only call
table.range_tombstones() and rt.should_suppress; keep the existing
active_memtable/sealed_memtables checks and ensure you use the same table-range
helpers used elsewhere to avoid duplicating range logic.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b5037095-c6a0-47c3-a880-07c3d65eaa33
📒 Files selected for processing (14)
- .forge/16/analysis.md
- src/abstract_tree.rs
- src/active_tombstone_set.rs
- src/memtable/interval_tree.rs
- src/memtable/mod.rs
- src/range.rs
- src/range_tombstone.rs
- src/range_tombstone_filter.rs
- src/table/iter.rs
- src/table/mod.rs
- src/table/util.rs
- src/table/writer/mod.rs
- src/tree/mod.rs
- tests/range_tombstone.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- src/range_tombstone.rs
Pull request overview
Adds MVCC-aware range tombstones to the LSM-tree, enabling efficient contiguous deletions via remove_range / remove_prefix, and persists/propagates tombstones through flush + compaction + recovery.
Changes:
- Introduces core range-tombstone data structures (RangeTombstone, memtable IntervalTree, sweep-line ActiveTombstoneSet, and RangeTombstoneFilter).
- Extends read/write paths to apply range-tombstone suppression (point reads, iterators, flush, compaction, and SST recovery).
- Adds integration + regression tests for correctness across memtables/SSTs/compaction/recovery.
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/range_tombstone.rs | Integration tests covering range tombstone behavior across layers and lifecycle events. |
| src/abstract_tree.rs | Adds remove_range API and default remove_prefix; extends flush to pass RTs through. |
| src/tree/mod.rs | Implements remove_range; applies RT suppression on point reads; flush now supports RT distribution. |
| src/blob_tree/mod.rs | Forwards range deletes through BlobTree to underlying index tree; flush now supports RT distribution. |
| src/memtable/mod.rs | Stores RTs in a mutex-protected interval tree; updates emptiness + adds RT query/insert APIs. |
| src/memtable/interval_tree.rs | New AVL interval tree for efficient RT overlap/suppression queries in memtables. |
| src/active_tombstone_set.rs | New forward/reverse active RT tracking for sweep-line iteration filtering. |
| src/range_tombstone.rs | Defines the RT type, ordering, and helper utilities (intersection, coverage checks). |
| src/range_tombstone_filter.rs | New bidirectional iterator wrapper that suppresses KV entries covered by RTs. |
| src/range.rs | Collects RTs from all layers; wraps merged iterator with RangeTombstoneFilter; adds table-skip optimization. |
| src/table/block/type.rs | Adds new BlockType::RangeTombstone = 4. |
| src/table/regions.rs | Extends table TOC parsing to include optional range_tombstones region. |
| src/table/inner.rs | Stores recovered table RTs in Inner. |
| src/table/mod.rs | Loads/decodes RT blocks on table open; exposes range_tombstones() accessor. |
| src/table/writer/mod.rs | Adds RT collection and writes RT block + metadata count during table finish. |
| src/table/multi_writer.rs | Distributes/clips RTs across rotated output tables (flush/compaction output). |
| src/table/util.rs | Treats RT blocks like data/meta blocks for metrics counters; changes lint suppression. |
| src/table/iter.rs | Changes lint suppression for constructor with many args. |
| src/compaction/flavour.rs | Adds write_range_tombstones hook so compaction can propagate RTs to outputs. |
| src/compaction/worker.rs | Collects RTs from input tables and propagates them to compaction outputs (with GC at last level). |
| src/lib.rs | Exposes new internal modules for RT support. |
| .forge/16/known-gaps.md | Documents known limitations and optimization opportunities for RT feature. |
| .forge/16/analysis.md | Design/architecture analysis and upstream context for RT implementation choices. |
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/table/mod.rs (1)
648-666: Consider preserving error context for easier debugging.

The map_err(|_| crate::Error::Unrecoverable) discards the original I/O error. While Unrecoverable is semantically appropriate for corrupted data, preserving context (e.g., which field failed, byte offset) could aid debugging in production.

💡 Optional: Add context to decode errors
If a more descriptive error variant exists or is added later (e.g., Error::RangeTombstoneCorrupted { offset, reason }), consider using it here to preserve the cursor position and failure reason.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/table/mod.rs` around lines 648 - 666, The current reads (cursor.read_u16::<LE>(), cursor.read_exact(...), cursor.read_u64::<LE>()) map all IO errors to crate::Error::Unrecoverable, discarding the original error and offset; update these map_err calls to capture and wrap the original error and context (e.g., which field: "start_len"/"start_buf"/"end_len"/"end_buf"/"seqno" and the cursor position/byte offset) into a more informative Error variant or into Unrecoverable with a message, so failures when reading start_len, start_buf, end_len, end_buf, and seqno preserve the underlying io::Error and an indication of the failing field/offset for easier debugging.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/table/mod.rs`:
- Around line 671-675: Add a unit test named
decode_range_tombstones_invalid_interval_returns_error() that builds a
serialized range tombstone blob matching the production format (little-endian
lengths and seqno fields), intentionally sets the interval so start >= end,
calls decode_range_tombstones(...) and asserts it returns
Err(crate::Error::Unrecoverable); reference the decode_range_tombstones function
and use the same byteorder::<LE> encoding for length/seqno fields so the
deserializer exercises the start >= end validation path.
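The requested regression test can be sketched like this. It uses std little-endian helpers instead of the `byteorder` crate, and `encode_rt` / `decode_one` are stand-ins for the crate's real encoder and `decode_range_tombstones`; only the `start < end` validation path is shown.

```rust
// Encode one tombstone in the stated wire layout:
// [start_len:u16_le][start][end_len:u16_le][end][seqno:u64_le]
fn encode_rt(start: &[u8], end: &[u8], seqno: u64) -> Vec<u8> {
    let mut buf = Vec::new();
    buf.extend_from_slice(&(start.len() as u16).to_le_bytes());
    buf.extend_from_slice(start);
    buf.extend_from_slice(&(end.len() as u16).to_le_bytes());
    buf.extend_from_slice(end);
    buf.extend_from_slice(&seqno.to_le_bytes());
    buf
}

fn decode_one(buf: &[u8]) -> Result<(Vec<u8>, Vec<u8>, u64), &'static str> {
    fn take<'a>(buf: &'a [u8], pos: &mut usize, n: usize) -> Result<&'a [u8], &'static str> {
        let s = buf.get(*pos..*pos + n).ok_or("truncated")?;
        *pos += n;
        Ok(s)
    }

    let mut pos = 0usize;
    let start_len = u16::from_le_bytes(take(buf, &mut pos, 2)?.try_into().unwrap()) as usize;
    let start = take(buf, &mut pos, start_len)?.to_vec();
    let end_len = u16::from_le_bytes(take(buf, &mut pos, 2)?.try_into().unwrap()) as usize;
    let end = take(buf, &mut pos, end_len)?.to_vec();
    let seqno = u64::from_le_bytes(take(buf, &mut pos, 8)?.try_into().unwrap());

    // The invariant under test: half-open [start, end) requires start < end.
    if start >= end {
        return Err("invalid interval: start >= end");
    }
    Ok((start, end, seqno))
}

fn main() {
    // Well-formed tombstone round-trips.
    assert!(decode_one(&encode_rt(b"a", b"m", 7)).is_ok());
    // start >= end must be rejected, mirroring the production validation.
    assert!(decode_one(&encode_rt(b"m", b"a", 7)).is_err());
    assert!(decode_one(&encode_rt(b"a", b"a", 7)).is_err());
    println!("invalid interval rejected");
}
```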
---
Nitpick comments:
In `@src/table/mod.rs`:
- Around line 648-666: The current reads (cursor.read_u16::<LE>(),
cursor.read_exact(...), cursor.read_u64::<LE>()) map all IO errors to
crate::Error::Unrecoverable, discarding the original error and offset; update
these map_err calls to capture and wrap the original error and context (e.g.,
which field: "start_len"/"start_buf"/"end_len"/"end_buf"/"seqno" and the cursor
position/byte offset) into a more informative Error variant or into
Unrecoverable with a message, so failures when reading start_len, start_buf,
end_len, end_buf, and seqno preserve the underlying io::Error and an indication
of the failing field/offset for easier debugging.
- Fix visible_at to use exclusive boundary (seqno < read_seqno), consistent with seqno_filter and Memtable::get conventions
- Validate decoded field lengths against remaining block data before allocating buffers (prevent OOM on corrupted blocks)
- Skip SST tables by key_range in point read RT suppression check
- Add code comments explaining #[allow] vs #[expect] for cfg-gated params
- Use #[expect] with reason for too_many_lines on create_range
- Fix remove_range doc to match no-op behavior for invalid intervals
- Update PR body to reflect RT-only flush semantics
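The "validate decoded field lengths before allocating" fix in the commit above can be sketched as follows; `read_field` is an illustrative helper, not the crate's actual code.

```rust
// Check the declared field length against the bytes actually remaining,
// so a corrupted length prefix cannot trigger a huge allocation.
fn read_field(buf: &[u8], pos: &mut usize) -> Result<Vec<u8>, &'static str> {
    let len_bytes = buf.get(*pos..*pos + 2).ok_or("truncated length prefix")?;
    let len = u16::from_le_bytes(len_bytes.try_into().unwrap()) as usize;
    *pos += 2;

    // Validate BEFORE allocating: the remaining block must hold `len` bytes.
    let remaining = buf.len() - *pos;
    if len > remaining {
        return Err("declared length exceeds remaining block data");
    }

    let field = buf[*pos..*pos + len].to_vec(); // safe: bounds just checked
    *pos += len;
    Ok(field)
}

fn main() {
    // Length prefix claims 0xFFFF bytes but only 3 follow: rejected cheaply.
    let corrupted = [0xFF, 0xFF, b'a', b'b', b'c'];
    let mut pos = 0;
    assert!(read_field(&corrupted, &mut pos).is_err());

    let valid = [0x03, 0x00, b'a', b'b', b'c'];
    let mut pos = 0;
    assert_eq!(read_field(&valid, &mut pos), Ok(b"abc".to_vec()));
    println!("length validated before allocation");
}
```

Without the check, a single flipped byte in the length prefix could request a multi-gigabyte `Vec` allocation during recovery.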
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/table/util.rs (1)
31-32: ⚠️ Potential issue | 🟡 Minor

Use `cfg_attr(..., expect(...))` instead of a permanent `#[allow]`.

The feature-gated parameter count is exactly what `cfg_attr(feature = "metrics", expect(...))` is for. Leaving `#[allow]` here suppresses the lint in the one build where it should still fire.

Suggested patch

```diff
-// NOTE: #[allow] not #[expect] because arg count changes with cfg(feature = "metrics")
-#[allow(clippy::too_many_arguments)]
+#[cfg_attr(
+    feature = "metrics",
+    expect(
+        clippy::too_many_arguments,
+        reason = "metrics adds the extra parameter; without that feature this stays at the lint threshold"
+    )
+)]
 pub fn load_block(
```

As per coding guidelines, "Prefer `#[expect(lint)]` over `#[allow(lint)]` — `#[expect]` warns when suppression becomes unnecessary."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/table/util.rs` around lines 31 - 32, Replace the permanent attribute #[allow(clippy::too_many_arguments)] with a conditional expect so the lint still fires when the metrics feature is disabled; specifically change the attribute to #[cfg_attr(feature = "metrics", expect(clippy::too_many_arguments))] (or the inverse depending on feature semantics) at the same item (the attribute applied in src/table/util.rs) so the suppression only applies under the cfg where extra arguments exist and the lint will warn in builds where it should.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/range.rs`:
- Around line 217-226: The table-cover check uses table.metadata.key_range.max()
which is an exclusive upper bound, but RangeTombstone::fully_covers() expects an
inclusive max; convert the exclusive upper bound to an inclusive one before
calling fully_covers (e.g., compute an inclusive_max from
table.metadata.key_range.max() using the appropriate predecessor/prev-key API or
method on the key type, handle empty ranges by skipping the check if min >
inclusive_max), and then call rt.fully_covers(table.metadata.key_range.min(),
inclusive_max) inside the all_range_tombstones.iter().any(...) closure while
keeping the visible_at and seqno comparison logic unchanged.
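The boundary mismatch described above can be avoided entirely by comparing both bounds half-open instead of converting the exclusive upper bound into an inclusive one (which requires a predecessor-key computation). A minimal sketch, with illustrative names rather than the crate's API:

```rust
// Tombstone [ts, te) fully covers a table whose keys all lie in
// [tmin, tmax_excl) iff the half-open table range is a subset of the
// half-open tombstone range. No predecessor key is needed.
fn fully_covers_half_open(ts: &[u8], te: &[u8], tmin: &[u8], tmax_excl: &[u8]) -> bool {
    ts <= tmin && tmax_excl <= te
}

fn main() {
    // Tombstone [b, f) vs. table keys in [c, f): covered.
    assert!(fully_covers_half_open(b"b", b"f", b"c", b"f"));
    // Table keys in [c, g) extend past the tombstone end: not covered.
    assert!(!fully_covers_half_open(b"b", b"f", b"c", b"g"));
    // Tombstone starting after the table's min: not covered.
    assert!(!fully_covers_half_open(b"d", b"f", b"c", b"f"));
    println!("half-open cover check ok");
}
```

Treating the exclusive `key_range.max()` as if it were inclusive makes the check overly strict at exactly the `tmax_excl == te` boundary, which is the false-negative the review comment describes.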
---
Duplicate comments:
In `@src/table/util.rs`:
- Around line 31-32: Replace the permanent attribute
#[allow(clippy::too_many_arguments)] with a conditional expect so the lint still
fires when the metrics feature is disabled; specifically change the attribute to
#[cfg_attr(feature = "metrics", expect(clippy::too_many_arguments))] (or the
inverse depending on feature semantics) at the same item (the attribute applied
in src/table/util.rs) so the suppression only applies under the cfg where extra
arguments exist and the lint will warn in builds where it should.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f4d3251b-ba68-46f2-8418-870e509cdcef
📒 Files selected for processing (7)
- src/abstract_tree.rs
- src/range.rs
- src/range_tombstone.rs
- src/table/iter.rs
- src/table/mod.rs
- src/table/util.rs
- src/tree/mod.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- src/table/iter.rs
Replace #[allow(clippy::too_many_arguments)] with #[cfg_attr(feature = "metrics", expect(...))] so the suppression only activates when the extra metrics parameter is compiled in.
Pull request overview
This PR introduces MVCC-correct range tombstones to support efficient delete_range / delete_prefix semantics across memtables and SSTs, including persistence through flush/compaction/recovery and iterator-time suppression.
Changes:
- Add `remove_range` / `remove_prefix` to `AbstractTree`, with memtable storage via an AVL interval tree and iterator-time filtering via a sweep-line `RangeTombstoneFilter`.
- Persist range tombstones into SSTs via a new `BlockType::RangeTombstone` and load/validate them on table recovery.
- Propagate/GC range tombstones during compaction and add an integration test suite for end-to-end semantics.
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/range_tombstone.rs | Integration tests for point reads, iteration, flush/compaction/recovery, prefix deletes, and RT-only flush behavior |
| src/abstract_tree.rs | Adds public remove_range/remove_prefix API + flush plumbing for range tombstones |
| src/tree/mod.rs | Flush writes RTs to SSTs and point-read path checks suppression by RTs across layers |
| src/blob_tree/mod.rs | RT support for blob-tree flush path + forwards remove_range |
| src/memtable/mod.rs | Stores RTs in a mutex-protected interval tree; exposes query/sorted export helpers |
| src/memtable/interval_tree.rs | New AVL interval tree implementation for efficient overlap/suppression queries |
| src/range.rs | Wraps merged iterators with RangeTombstoneFilter and adds table-skip optimization |
| src/range_tombstone.rs | Defines RangeTombstone semantics, ordering, clipping, and helpers |
| src/range_tombstone_filter.rs | Bidirectional sweep-line iterator filter for RT suppression |
| src/active_tombstone_set.rs | Active set structures used by the iterator filter (forward + reverse) |
| src/table/block/type.rs | Adds new on-disk block tag RangeTombstone = 4 |
| src/table/writer/mod.rs | Writes an RT block into SSTs and records range_tombstone_count in metadata |
| src/table/regions.rs | TOC parsing extended to locate the range_tombstones section |
| src/table/inner.rs | Stores loaded RTs in table inner state |
| src/table/mod.rs | Loads/decodes RT blocks on recovery and exposes range_tombstones() |
| src/table/multi_writer.rs | Attempts to distribute/clamp RTs across rotated output tables |
| src/table/util.rs | Treats RT blocks like data/meta for metrics bookkeeping; lint suppression change |
| src/table/iter.rs | Lint suppression change for constructor arg count |
| src/compaction/worker.rs | Collects RTs from input tables and attempts to propagate/GC them |
| src/compaction/flavour.rs | Adds write_range_tombstones hook and wires it to the table writer |
| src/lib.rs | Exposes new internal modules for range tombstone support |
| .forge/16/known-gaps.md | Documents limitations/known gaps for the feature |
| .forge/16/analysis.md | Architecture analysis and design trade-offs documentation |
🧹 Nitpick comments (2)
src/range.rs (2)
302-307: Strengthen the fast path to skip wrapping when no tombstone is visible.

A non-empty `all_range_tombstones` can still have zero entries visible at `seqno`, so this currently wraps with a no-op filter in those cases.

⚡ Proposed patch

```diff
-    // Fast path: skip filter wrapping when no range tombstones exist
-    if all_range_tombstones.is_empty() {
+    // Fast path: skip wrapping when no tombstone is visible for this read seqno
+    if !all_range_tombstones.iter().any(|rt| rt.visible_at(seqno)) {
         Box::new(iter)
     } else {
         Box::new(RangeTombstoneFilter::new(iter, all_range_tombstones, seqno))
     }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/range.rs` around lines 302 - 307, The fast-path should skip wrapping not only when all_range_tombstones.is_empty() but also when none of the tombstones are visible at the given seqno; update the conditional around Box::new(...) to inspect all_range_tombstones for any entry visible at seqno (e.g. using the visibility predicate/method used by RangeTombstoneFilter or by filtering/map-checking the collection) and only construct RangeTombstoneFilter::new(iter, all_range_tombstones, seqno) if at least one tombstone is visible; otherwise return Box::new(iter) unchanged.
176-196: Remove redundant tombstone sorting in this path.
`rts.sort()` here is duplicate work: `RangeTombstoneFilter::new` already sorts the forward list (and builds a reverse-sorted copy). This adds avoidable `O(n log n)` cost per range open.

♻️ Proposed patch

```diff
 let all_range_tombstones = if rt_count > 0 {
     let mut rts: Vec<RangeTombstone> = Vec::with_capacity(rt_count);
     rts.extend(lock.version.active_memtable.range_tombstones_sorted());
     for mt in lock.version.sealed_memtables.iter() {
         rts.extend(mt.range_tombstones_sorted());
     }
     if let Some((mt, _)) = &lock.ephemeral {
         rts.extend(mt.range_tombstones_sorted());
     }
     for table in lock
         .version
         .version
         .iter_levels()
         .flat_map(|lvl| lvl.iter())
         .flat_map(|run| run.iter())
     {
         rts.extend(table.range_tombstones().iter().cloned());
     }
-    rts.sort();
     rts
 } else {
     Vec::new()
 };
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/range.rs` around lines 176 - 196, The vector `rts` constructed for `all_range_tombstones` is being sorted with `rts.sort()` redundantly; `RangeTombstoneFilter::new` already sorts the forward list and produces the reverse-sorted copy, so remove the `rts.sort()` call in the block that builds `rts` (the code that extends from `lock.version.active_memtable.range_tombstones_sorted()`, sealed memtables, `lock.ephemeral`, and version runs) and leave the unsorted `rts` to be passed into `RangeTombstoneFilter::new`; this avoids the extra O(n log n) work while preserving correctness for `RangeTombstone` handling.
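For context, the suppression decision the filter makes for each key/value pair can be sketched as the predicate below. A real sweep-line filter activates and expires tombstones incrementally instead of scanning them all; this brute-force version only illustrates the rule, and every name here is illustrative rather than the crate's API.

```rust
// A KV is dropped when some tombstone covers its key, is newer than the
// KV, and is itself visible at the read seqno (exclusive boundary:
// rt.seqno < read_seqno, matching the convention stated in this PR).
struct Rt {
    start: Vec<u8>,
    end: Vec<u8>, // exclusive
    seqno: u64,
}

fn suppressed(key: &[u8], kv_seqno: u64, read_seqno: u64, rts: &[Rt]) -> bool {
    rts.iter().any(|rt| {
        rt.seqno < read_seqno              // tombstone visible at this snapshot
            && rt.seqno > kv_seqno         // tombstone is newer than the KV
            && rt.start.as_slice() <= key
            && key < rt.end.as_slice()     // half-open coverage
    })
}

fn main() {
    let rts = vec![Rt { start: b"b".to_vec(), end: b"f".to_vec(), seqno: 10 }];
    assert!(suppressed(b"c", 5, 100, &rts));   // covered, older KV: dropped
    assert!(!suppressed(b"c", 15, 100, &rts)); // KV written after tombstone: kept
    assert!(!suppressed(b"f", 5, 100, &rts));  // end is exclusive: kept
    assert!(!suppressed(b"c", 5, 10, &rts));   // tombstone not visible at seqno 10: kept
    println!("predicate ok");
}
```

The fast-path patch above amounts to checking whether this predicate could ever be true for the given read seqno before paying for the wrapper at all.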
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/range.rs`:
- Around line 302-307: The fast-path should skip wrapping not only when
all_range_tombstones.is_empty() but also when none of the tombstones are visible
at the given seqno; update the conditional around Box::new(...) to inspect
all_range_tombstones for any entry visible at seqno (e.g. using the visibility
predicate/method used by RangeTombstoneFilter or by filtering/map-checking the
collection) and only construct RangeTombstoneFilter::new(iter,
all_range_tombstones, seqno) if at least one tombstone is visible; otherwise
return Box::new(iter) unchanged.
- Around line 176-196: The vector `rts` constructed for `all_range_tombstones`
is being sorted with `rts.sort()` redundantly; `RangeTombstoneFilter::new`
already sorts the forward list and produces the reverse-sorted copy, so remove
the `rts.sort()` call in the block that builds `rts` (the code that extends from
`lock.version.active_memtable.range_tombstones_sorted()`, sealed memtables,
`lock.ephemeral`, and version runs) and leave the unsorted `rts` to be passed
into `RangeTombstoneFilter::new`; this avoids the extra O(n log n) work while
preserving correctness for `RangeTombstone` handling.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 36192350-8de8-4946-a95f-d0e3da61329a
📒 Files selected for processing (3)
- src/range.rs
- src/table/iter.rs
- src/table/util.rs
Codecov Report: ❌ Patch coverage is
…unds

- Set range tombstones on MultiWriter before KV write loop in flush (Tree, BlobTree) so rotations carry RT metadata in earlier tables
- Propagate RTs before merge_iter loop in compaction worker
- Assert u16 bound on start/end key lengths in insert_range_tombstone
- Use exclusive seqno boundary in visible_at doc strings
- Strengthen iteration fast path: check visible_at not just is_empty
- Remove redundant sort (RangeTombstoneFilter::new sorts internally)
- Compaction with MultiWriter rotation preserves RTs across tables
- Table-skip optimization skips fully-covered tables
- BlobTree range tombstone insert, suppress, flush
- Invalid interval (start >= end) silently returns 0
- Multiple compaction rounds preserve range tombstones
Pull request overview
Adds MVCC-aware range tombstones to enable efficient range/prefix deletion (remove_range / remove_prefix) with persistence across flush/compaction/recovery, plus iterator- and point-read suppression.
Changes:
- Introduces core range-tombstone data structures (range tombstone model, sweep-line active sets, memtable interval tree, iterator filter).
- Persists range tombstones in SSTs via a new `BlockType::RangeTombstone` and TOC section, propagating them through compaction.
- Updates read path (point reads + range/prefix iteration) to suppress covered entries and adds comprehensive tests/docs.
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/range_tombstone.rs | New integration test suite for RT semantics across memtable/SST/compaction/recovery. |
| src/abstract_tree.rs | Adds remove_range API + remove_prefix helper; flush now forwards RTs to table writers. |
| src/blob_tree/mod.rs | Wires RT-aware flush through blob tree and forwards remove_range. |
| src/tree/mod.rs | Implements remove_range for Tree, RT-aware flush-to-tables, and point-read suppression checks. |
| src/range.rs | Collects RTs across layers for range/prefix iteration; adds RT filtering and table-skip optimization. |
| src/memtable/mod.rs | Stores memtable RTs (mutex-protected interval tree) and exposes query/flush helpers. |
| src/memtable/interval_tree.rs | New AVL interval tree for RT overlap and suppression queries in memtables. |
| src/active_tombstone_set.rs | New forward/reverse sweep-line active tombstone sets for iteration. |
| src/range_tombstone.rs | Defines RangeTombstone and helpers (visibility, suppression, clipping, coverage). |
| src/range_tombstone_filter.rs | New bidirectional iterator wrapper that suppresses keys covered by visible RTs. |
| src/table/block/type.rs | Adds BlockType::RangeTombstone = 4 to the SST block type enum. |
| src/table/writer/mod.rs | Writes RT blocks + count metadata during SST creation; updates writer state to track RTs. |
| src/table/regions.rs | Parses new "range_tombstones" TOC section into table regions. |
| src/table/mod.rs | Loads/decodes RT blocks on table recovery and exposes Table::range_tombstones(). |
| src/table/inner.rs | Stores recovered table RTs in Table inner state. |
| src/table/multi_writer.rs | Distributes/clips RTs across rotated output tables during flush/compaction. |
| src/table/util.rs | Includes RT blocks in metrics accounting for block loads (cache + IO). |
| src/table/iter.rs | Adjusts clippy lint gating for metrics-conditional argument count. |
| src/compaction/worker.rs | Collects RTs from compaction inputs and propagates them to outputs (with GC eviction at last level). |
| src/compaction/flavour.rs | Extends compaction flavour interface to pass RTs into table writers. |
| src/lib.rs | Exposes new internal modules for RT support. |
| .forge/16/known-gaps.md | Documents known limitations/optimization gaps for Issue #16 implementation. |
| .forge/16/analysis.md | Architecture/design analysis and upstream context for range tombstones. |
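The interval-tree memtable index listed above (src/memtable/interval_tree.rs) can be sketched as an augmented BST keyed by interval start, where each node also tracks the maximum end key in its subtree to prune overlap queries. This sketch omits the AVL rebalancing the real implementation performs, and all names are illustrative:

```rust
// Interval-tree node: ordered by `start`, augmented with `max_end` so a
// stabbing query can skip whole subtrees that cannot contain the key.
struct Node {
    start: Vec<u8>,
    end: Vec<u8>,     // exclusive
    seqno: u64,
    max_end: Vec<u8>, // largest `end` in this subtree
    left: Option<Box<Node>>,
    right: Option<Box<Node>>,
}

fn insert(node: &mut Option<Box<Node>>, start: Vec<u8>, end: Vec<u8>, seqno: u64) {
    match node {
        None => {
            *node = Some(Box::new(Node {
                max_end: end.clone(),
                start,
                end,
                seqno,
                left: None,
                right: None,
            }));
        }
        Some(n) => {
            if end > n.max_end {
                n.max_end = end.clone();
            }
            if start < n.start {
                insert(&mut n.left, start, end, seqno);
            } else {
                insert(&mut n.right, start, end, seqno);
            }
        }
    }
}

// Collect seqnos of tombstones whose [start, end) contains `key`.
fn stab(node: &Option<Box<Node>>, key: &[u8], out: &mut Vec<u64>) {
    if let Some(n) = node {
        if key >= n.max_end.as_slice() {
            return; // no interval anywhere in this subtree can cover key
        }
        stab(&n.left, key, out);
        if n.start.as_slice() <= key && key < n.end.as_slice() {
            out.push(n.seqno);
        }
        if key >= n.start.as_slice() {
            stab(&n.right, key, out); // right subtree starts > key otherwise
        }
    }
}

fn main() {
    let mut root = None;
    insert(&mut root, b"b".to_vec(), b"f".to_vec(), 10);
    insert(&mut root, b"d".to_vec(), b"k".to_vec(), 12);
    let mut hits = Vec::new();
    stab(&root, b"e", &mut hits);
    assert_eq!(hits, vec![10, 12]); // both intervals cover "e"
    println!("{hits:?}");
}
```

With rebalancing, the `max_end` pruning is what gives the O(log n) point-query behavior the PR description claims for memtable suppression checks.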
- Write synthetic weak tombstone in RT-only tables to produce valid index
- Fix interval_tree pruning: use >= for exclusive seqno boundary
- Replace #[allow(dead_code)] with #[expect(dead_code, reason)] throughout
- Remove key_range filter from SST RT suppression (RT range may exceed table key range)
@coderabbitai re-review
🧠 Learnings used

✅ Actions performed: Full review triggered.
Actionable comments posted: 2
♻️ Duplicate comments (1)
.github/copilot-instructions.md (1)
65-74: ⚠️ Potential issue | 🟠 Major

Narrow this rule to PR-visible call sites and preserve unconditional Tier 1/2 reporting.
Line 69 (“check every call site”) conflicts with Line 9 diff-only review scope, and Line 73’s “not a bug” wording is too absolute. Please scope this to call sites visible in the PR diff and explicitly keep Tier 1/Tier 2 findings reportable even when a local protocol exists.
Suggested wording tweak

```diff
-**Before flagging any potential issue, trace the FULL call chain:**
+**Before flagging any potential issue, trace the call chain visible in the PR diff:**
 ...
-**Only flag an issue if it survives all three checks.** A finding that looks like a bug in one function but is handled by its caller is not a bug — it is a protocol between the two functions.
+**Only suppress Tier 3/Tier 4 findings if they fail these checks.** If the pattern still indicates Tier 1 (correctness) or Tier 2 (safety/crash-recovery) risk, flag it even when caller/callee protocols are documented.
```

Based on learnings: "Review ONLY code within the PR's diff" and "Tier 1/Tier 2 issues must be flagged regardless of documentation."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/copilot-instructions.md around lines 65 - 74, Update the "Design Decision Analysis (CRITICAL)" guidance to limit the "check every call site" requirement to only call sites that are visible within the PR diff and to explicitly preserve unconditional reporting for Tier 1 and Tier 2 findings; edit the sentences that mention "check every call site" and "Only flag an issue if it survives all three checks" so they read that reviewers should trace the full call chain only among PR-visible call sites and that documented local protocols do not suppress Tier 1/Tier 2 reports.
🧹 Nitpick comments (1)
src/range_tombstone_filter.rs (1)
151-261: Rename the new unit tests to the repo's standard pattern.

Several of these names (`no_tombstones`, `basic_suppression`, `half_open_end_exclusive`, `reverse_half_open`, …) are outside the repository's `what_condition_expected` convention, which makes the new test surface less consistent than the rest of the crate.

As per coding guidelines, "Use test naming convention: `fn <what>_<condition>_<expected>()`."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/range_tombstone_filter.rs` around lines 151 - 261, Rename the tests to follow the <what>_<condition>_<expected> pattern: no_tombstones -> items_no_tombstones_return_all, basic_suppression -> items_with_range_tombstone_suppress_intermediate_keys, tombstone_does_not_suppress_newer_kv -> tombstone_older_than_kv_does_not_suppress, half_open_end_exclusive -> range_end_exclusive_suppresses_middle_keys, multiple_overlapping_tombstones -> overlapping_tombstones_keep_newer_kv, tombstone_not_visible_at_read_seqno -> tombstone_newer_than_read_seqno_not_visible, reverse_basic_suppression -> rev_items_with_range_tombstone_suppress_middle_keys, reverse_half_open -> rev_range_end_exclusive_suppresses_between; update the fn identifiers accordingly (e.g., fn items_no_tombstones_return_all()) and keep the test bodies unchanged so references to RangeTombstoneFilter, rt, kv, SeqNo::MAX remain valid.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/range_tombstone.rs`:
- Around line 193-196: The module-level clippy suppression currently only lists
clippy::unwrap_used (the attribute on the module near the top of
src/range_tombstone.rs); add clippy::expect_used to that same attribute so calls
to expect(...) (e.g., the expect used around line ~385) are also silenced during
clippy runs. Update the existing #[expect(...)] attribute to include both
clippy::unwrap_used and clippy::expect_used.
In `@tests/range_tombstone.rs`:
- Around line 877-902: The test
compaction_clip_preserves_rt_covering_gap_between_tables currently uses
major_compact(64_000_000, 0) with tiny inputs so it compacts into a single
output table and never exercises the described gap case; either (A)
rename/reword the test and its comments to state it is a major-compaction
regression (remove claims about a gap between output tables) referencing the
test fn compaction_clip_preserves_rt_covering_gap_between_tables and calls to
major_compact/remove_range/flush_active_memtable, or (B) change the test inputs
to force multiple output tables (e.g., insert more keys or lower the compaction
target so major_compact produces two outputs, or invoke the leveled compaction
path if available) so the RT [m,p) actually falls in a gap between output tables
and the assertions exercise the intended case.
---
Duplicate comments:
In @.github/copilot-instructions.md:
- Around line 65-74: Update the "Design Decision Analysis (CRITICAL)" guidance
to limit the "check every call site" requirement to only call sites that are
visible within the PR diff and to explicitly preserve unconditional reporting
for Tier 1 and Tier 2 findings; edit the sentences that mention "check every
call site" and "Only flag an issue if it survives all three checks" so they read
that reviewers should trace the full call chain only among PR-visible call sites
and that documented local protocols do not suppress Tier 1/Tier 2 reports.
---
Nitpick comments:
In `@src/range_tombstone_filter.rs`:
- Around line 151-261: Rename the tests to follow the
<what>_<condition>_<expected> pattern: no_tombstones ->
items_no_tombstones_return_all, basic_suppression ->
items_with_range_tombstone_suppress_intermediate_keys,
tombstone_does_not_suppress_newer_kv ->
tombstone_older_than_kv_does_not_suppress, half_open_end_exclusive ->
range_end_exclusive_suppresses_middle_keys, multiple_overlapping_tombstones ->
overlapping_tombstones_keep_newer_kv, tombstone_not_visible_at_read_seqno ->
tombstone_newer_than_read_seqno_not_visible, reverse_basic_suppression ->
rev_items_with_range_tombstone_suppress_middle_keys, reverse_half_open ->
rev_range_end_exclusive_suppresses_between; update the fn identifiers
accordingly (e.g., fn items_no_tombstones_return_all()) and keep the test bodies
unchanged so references to RangeTombstoneFilter, rt, kv, SeqNo::MAX remain
valid.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 31c0a672-6e79-4242-ad4e-bc89cd0d3d5b
📒 Files selected for processing (28)
- .github/copilot-instructions.md
- .github/instructions/rust.instructions.md
- README.md
- src/abstract_tree.rs
- src/active_tombstone_set.rs
- src/any_tree.rs
- src/blob_tree/mod.rs
- src/compaction/flavour.rs
- src/compaction/worker.rs
- src/format_version.rs
- src/lib.rs
- src/memtable/interval_tree.rs
- src/memtable/mod.rs
- src/range.rs
- src/range_tombstone.rs
- src/range_tombstone_filter.rs
- src/table/block/type.rs
- src/table/inner.rs
- src/table/iter.rs
- src/table/mod.rs
- src/table/multi_writer.rs
- src/table/regions.rs
- src/table/util.rs
- src/table/writer/mod.rs
- src/tree/mod.rs
- src/version/mod.rs
- tests/range_tombstone.rs
- tests/tree_recovery_versions.rs
…TODO, ignore heavy test

- Add clippy::expect_used to range_tombstone test module expect list
- Add range_tombstones section to ASCII table layout diagram
- Annotate RT block metrics folded into data_block counters
- Mark 68 MiB multi-table flush test as #[ignore]
- Assert InvalidVersion error type in manifest version test
@coderabbitai re-review
🧠 Learnings used

✅ Actions performed: Full review triggered.
Actionable comments posted: 2
🧹 Nitpick comments (1)
.github/copilot-instructions.md (1)
73-73: Clarify suppression wording to avoid reviewer misinterpretation

The sentence can be read both ways. Consider rewording to explicitly say Tier 3/4 findings may be suppressed only when they do not survive the 3 checks.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/copilot-instructions.md at line 73, The sentence "Only suppress Tier 3/Tier 4 findings if they fail these checks." is ambiguous; update the wording in .github/copilot-instructions.md to explicitly state that Tier 3/Tier 4 findings may be suppressed only when they do not survive the three checks (e.g., "Only suppress Tier 3/Tier 4 findings if they do not survive the three checks below: ..."), making clear that any finding that still indicates Tier 1 or Tier 2 must be flagged regardless of suppression rationale.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/range_tombstone.rs`:
- Around line 396-409: The test currently calls tree.major_compact(64_000_000,
0)? but never asserts that compaction actually ran; immediately after calling
tree.major_compact(...) add a structural assertion that the number of SSTables
collapsed (e.g., assert_eq!(1, tree.table_count());) so the test fails if
compaction short-circuits — reference the tree.major_compact and
tree.table_count symbols when making the change.
- Around line 641-652: After calling tree.major_compact(1024, 0) add an
assertion that the compaction produced multiple output tables to ensure the
MultiWriter rotation path was exercised; e.g. immediately after the compact call
assert that the number of output tables is >= 2 by querying the tree's API
(replace with the real accessor if different) such as
assert!(tree.out_tables().len() >= 2) or assert!(tree.num_tables() >= 2) (or use
tree.manifest().live_tables().len() / appropriate method) before the existing
tree.get(...) checks.
---
Nitpick comments:
In @.github/copilot-instructions.md:
- Line 73: The sentence "Only suppress Tier 3/Tier 4 findings if they fail these
checks." is ambiguous; update the wording in .github/copilot-instructions.md to
explicitly state that Tier 3/Tier 4 findings may be suppressed only when they do
not survive the three checks (e.g., "Only suppress Tier 3/Tier 4 findings if
they do not survive the three checks below: ..."), making clear that any finding
that still indicates Tier 1 or Tier 2 must be flagged regardless of suppression
rationale.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6c6916bd-814f-424b-8d8f-082c1a50a5a8
📒 Files selected for processing (28)
- .github/copilot-instructions.md
- .github/instructions/rust.instructions.md
- README.md
- src/abstract_tree.rs
- src/active_tombstone_set.rs
- src/any_tree.rs
- src/blob_tree/mod.rs
- src/compaction/flavour.rs
- src/compaction/worker.rs
- src/format_version.rs
- src/lib.rs
- src/memtable/interval_tree.rs
- src/memtable/mod.rs
- src/range.rs
- src/range_tombstone.rs
- src/range_tombstone_filter.rs
- src/table/block/type.rs
- src/table/inner.rs
- src/table/iter.rs
- src/table/mod.rs
- src/table/multi_writer.rs
- src/table/regions.rs
- src/table/util.rs
- src/table/writer/mod.rs
- src/tree/mod.rs
- src/version/mod.rs
- tests/range_tombstone.rs
- tests/tree_recovery_versions.rs
Summary
- `remove_range(start, end, seqno)` and `remove_prefix(prefix, seqno)` API on `AbstractTree`

Technical Details
Core types (ported algorithms from upstream PR fjall-rs#242, own SST format):
- `RangeTombstone` — half-open `[start, end)` interval with seqno
- `ActiveTombstoneSet` / `ActiveTombstoneSetReverse` — sweep-line trackers for iteration
- `IntervalTree` — AVL-balanced interval tree in memtable for O(log n) queries
- `RangeTombstoneFilter` — bidirectional wrapper that suppresses covered entries

Read path:
- `RangeTombstoneFilter` with sweep-line activation/expiry
- Point reads: skip tables by `key_range`, then linearly scan only candidate RT blocks

Write path:
- `key_range` widened to include RT coverage so compaction overlap detection includes these tables; `MultiWriter` uses `clip_range_tombstones=false` for flush
- Widened `key_range` survives `Writer::finish()` / rotation and remains sound for future overlap checks and point-read prefiltering
- `optimize_runs()` only fuses truly disjoint tables back together, so RT-widened sibling tables do not violate `Run::get_for_key()`'s non-overlap assumption
- Compaction clips RTs via `intersect_opt()`; `MultiWriter` uses `clip_range_tombstones=true` (via `.use_clip_range_tombstones()`)
- RTs are deduplicated (sort + dedup) at flush, compaction input, and range iteration — prevents RT accumulation from MultiWriter rotation
- `WeakTombstone` sentinel uses the start key of the lowest-seqno tombstone (with start as a deterministic tiebreaker), while `meta.first_key` / `meta.last_key` still cover `min(rt.start)` / `max(rt.end)` for compaction overlap

Visibility / API:
- `AbstractTree` remains public as a consumer-facing bound, but is now sealed to crate-owned tree types; internal RT-aware flush hooks stay hidden implementation details
- Sealing `AbstractTree` is an intentional public-API break for external implementers in this fork; we are not preserving extension-trait semver compatibility for downstream custom implementations, and the longer-term facade/internal split is tracked in "Refactor AbstractTree into a public facade and internal engine trait" #29
- Suppression helpers (`is_key_suppressed_by_range_tombstone`, `range_tombstones_sorted`, `range_tombstones()`) are `pub(crate)` — internal implementation detail, not part of public API
- The `RangeTombstone` type is `pub(crate)` (module not publicly exported)

SST format:
- `BlockType::RangeTombstone = 4`
- Wire format: `[start_len:u16_le][start][end_len:u16_le][end][seqno:u64_le]` (repeated)
- `"range_tombstones"` section in table TOC
- `start < end` invariant enforced during decoding
- Key lengths bounded via `u16::try_from` — oversized keys rejected with `log::warn` diagnostic
- `upper_bound_exclusive` returns `Option<UserKey>` (`None` only for the lexicographically maximal encodable user key)
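The raw wire format stated above — `[start_len:u16_le][start][end_len:u16_le][end][seqno:u64_le]`, repeated — can be round-tripped with a short sketch. `encode_block` / `decode_block` are illustrative names, not the crate's functions, and this version uses std little-endian helpers rather than the `byteorder` crate:

```rust
fn encode_block(rts: &[(Vec<u8>, Vec<u8>, u64)]) -> Vec<u8> {
    let mut out = Vec::new();
    for (start, end, seqno) in rts {
        out.extend_from_slice(&(start.len() as u16).to_le_bytes());
        out.extend_from_slice(start);
        out.extend_from_slice(&(end.len() as u16).to_le_bytes());
        out.extend_from_slice(end);
        out.extend_from_slice(&seqno.to_le_bytes());
    }
    out
}

fn take<'a>(buf: &'a [u8], pos: &mut usize, n: usize) -> Option<&'a [u8]> {
    let s = buf.get(*pos..(*pos).checked_add(n)?)?;
    *pos += n;
    Some(s)
}

fn decode_block(buf: &[u8]) -> Option<Vec<(Vec<u8>, Vec<u8>, u64)>> {
    let mut rts = Vec::new();
    let mut pos = 0;
    while pos < buf.len() {
        let start_len = u16::from_le_bytes(take(buf, &mut pos, 2)?.try_into().ok()?) as usize;
        let start = take(buf, &mut pos, start_len)?.to_vec();
        let end_len = u16::from_le_bytes(take(buf, &mut pos, 2)?.try_into().ok()?) as usize;
        let end = take(buf, &mut pos, end_len)?.to_vec();
        let seqno = u64::from_le_bytes(take(buf, &mut pos, 8)?.try_into().ok()?);
        if start >= end {
            return None; // start < end invariant enforced during decoding
        }
        rts.push((start, end, seqno));
    }
    Some(rts)
}

fn main() {
    let rts = vec![
        (b"a".to_vec(), b"m".to_vec(), 7u64),
        (b"p".to_vec(), b"z".to_vec(), 9u64),
    ];
    let block = encode_block(&rts);
    assert_eq!(decode_block(&block), Some(rts));
    println!("round-trip ok, block is {} bytes", block.len());
}
```

Because records are length-prefixed with fixed-width little-endian integers and simply concatenated, the decoder needs no separate record count: it reads until the block is exhausted and rejects any record that violates the interval invariant.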
This PR now targets disk format V4.
The format bump is intentional and covers the new persisted semantics introduced in the fork:
range_tombstonessection /BlockType::RangeTombstoneMergeOperandvalue kind in the internal value encoding on the merge-operator branchKeeping these changes under
V3would be unsafe:V3reader could open a database containing range tombstones, ignore the new SST section, and return keys that are logically deletedV3reader could not safely interpret persisted merge operands, because it does not implement merge-resolution semantics in reads and compactionBecause both features change the meaning of on-disk data rather than only extending in-memory APIs, they must share a new compatibility gate.
V4is therefore the correct boundary for this fork's new persisted semantics:V3databases remain readable by the new codeV3binariesKnown Limitations
Test Plan
- `remove_prefix` API, including empty/all-`0xFF` soft-reject edge cases
- `key_range` keeps RT coverage after flush/finish
- (`V4` written by new code, safe `V3` recovery still accepted, unsupported versions rejected on open)
- `cargo test --all-features -- --nocapture`

Closes #16
Summary by CodeRabbit
New Features
Behavior
Tests
Documentation